{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "7f9d3587-a07f-4657-a57d-8d28619081d0",
      "metadata": {
        "id": "XIyP_0r6zuVc"
      },
      "source": [
        "<!-- Banner Image -->\n",
        "<img src=\"https://uohmivykqgnnbiouffke.supabase.co/storage/v1/object/public/landingpage/brevdevnotebooks.png\" width=\"100%\">\n",
        "\n",
        "<!-- Links -->\n",
        "<center>\n",
        "  <a href=\"https://brev.nvidia.com\" style=\"color: #06b6d4;\">Console</a> •\n",
        "  <a href=\"https://developer.nvidia.com/brev\" style=\"color: #06b6d4;\">Docs</a> •\n",
        "  <a href=\"/\" style=\"color: #06b6d4;\">Templates</a> •\n",
        "  <a href=\"https://discord.gg/NVDyv7TUgJ\" style=\"color: #06b6d4;\">Discord</a>\n",
        "</center>\n",
        "\n",
        "# Run Google's Gemma🤙\n",
        "\n",
        "Welcome!\n",
        "\n",
        "In this notebook and tutorial, we will download & run Google's new [Gemma 7B-parameter model](https://huggingface.co/google/gemma-7b). From the Hugging Face page: \n",
        "> Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.\n",
        "\n",
        "If you would like a fine-tuning guide, join our [Discord](https://discord.gg/y9428NwTh3) or message me on [X](https://x.com/harperscarroll) to let me know!"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "4c180393-2631-4e44-97bd-47b8cced2060",
      "metadata": {
        "id": "SSBw-KpkyRga"
      },
      "source": [
        "#### Before we begin: A note on OOM errors\n",
        "\n",
        "If you get an error like this: `OutOfMemoryError: CUDA out of memory`, tweak your parameters to make the model less computationally intensive. I will help guide you through that in this guide, and if you have any additional questions you can reach out on the [Discord channel](https://discord.gg/y9428NwTh3) or on [X](https://x.com/harperscarroll).\n",
        "\n",
        "To re-try after you tweak your parameters, open a Terminal ('Launcher' or '+' in the nav bar above -> Other -> Terminal) and run the command `nvidia-smi`. Then find the process ID `PID` under `Processes` and run the command `kill [PID]`. You will need to re-start your notebook from the beginning. (There may be a better way to do this... if so please do let me know!)\n",
        "\n",
        "### Let's begin!"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "ae808e1c-21bc-4b00-ae44-bbfd7d1ed82e",
      "metadata": {
        "id": "hWI-uRLEyRgb"
      },
      "source": [
        "## 1. Get & Set Up a GPU\n",
        "\n",
        "I used a GPU and dev environment from [brev.dev](https://developer.nvidia.com/brev). Click the badge below to get your preconfigured instance:\n",
        "\n",
        "[![](https://brev-assets.s3.us-west-1.amazonaws.com/nv-lb-dark.svg)](https://brev.nvidia.com/environment/new?instance=A10G:g5.xlarge&name=gemma&file=https://github.com/brevdev/notebooks/raw/main/gemma.ipynb&python=3.10&cuda=12.0.1)\n",
        "\n",
        "Once you've checked out your machine and landed in your instance page, select the specs you'd like (I used Python 3.10 and CUDA 12.0.1; these should be preconfigured for you if you use the badge above) and click the \"Build\" button to build your verb container. Give this a few minutes.\n",
        "\n",
        "A few minutes after your model has started Running, click the 'Notebook' button on the top right of your screen once it illuminates (you may need to refresh the screen). You will be taken to a Jupyter Lab environment, where you can upload this Notebook.\n",
        "\n",
        "Note: You can connect your cloud credits (AWS or GCP) by clicking \"Org: \" on the top right, and in the panel that slides over, click \"Connect AWS\" or \"Connect GCP\" under \"Connect your cloud\" and follow the instructions linked to attach your credentials."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "327a9d63-0dc6-49fa-ac93-550ff93ab4e6",
      "metadata": {},
      "source": [
        "Now, run the cell below (Shift + Enter when you've highlighted the cell) to import the required libraries onto the GPU."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "id": "72283f98-53ac-4111-abfe-f995c8e8500b",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\n",
            "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\n",
            "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
          ]
        }
      ],
      "source": [
        "!pip install -q -U torch torchvision torchaudio transformers jupyter ipywidgets accelerate"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "e51b1aa8-1602-4266-9af7-ff05ea060bc3",
      "metadata": {},
      "source": [
        "## 2. Download Gemma 7B\n",
        "\n",
        "Now that we've installed the necessary libraries, let's pull the BioMistral model from Hugging Face.\n",
        "\n",
        "First, request access to Gemma by clicking [here](https://huggingface.co/google/gemma-2b). Then, open a Terminal ('+' or 'Launcher' in the tabs above, then 'Terminal') and input `huggingface-cli login` and press Enter. Get a token by following the [link](https://huggingface.co/settings/tokens) (make sure you're logged in) and input it where you're prompted `Token:`. Note that for your safety/privacy, you won't see any text output when you paste the token into the Terminal shell.  \n",
        "\n",
        "Check the output. If you see red text, you then may have to run `git config --global credential.helper store` and then `huggingface-cli login` again.\n",
        "\n",
        "Once you see `Login successful`, you can run the cell below."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "f7347593-a776-4fd6-88bb-737e590fc2ac",
      "metadata": {},
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "5e5995247b6d4730957e487600d13256",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "import torch\n",
        "from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig\n",
        "\n",
        "model_id = \"google/gemma-7b\"\n",
        "# if you'd like, you can replace this with the instruct model (good for Q&A; like chatbot) by uncommenting the cell below\n",
        "# model_id = \"google/gemma-7b-it\"\n",
        "\n",
        "model = AutoModelForCausalLM.from_pretrained(\n",
        "    model_id, \n",
        "    token=True,\n",
        "    device_map=\"auto\",\n",
        "    trust_remote_code=True,\n",
        ")\n",
        "\n",
        "bnb_config = BitsAndBytesConfig(\n",
        "    load_in_4bit=True,\n",
        "    bnb_4bit_use_double_quant=True,\n",
        "    bnb_4bit_quant_type=\"nf4\",\n",
        "    bnb_4bit_compute_dtype=torch.bfloat16\n",
        ")\n",
        "\n",
        "eval_tokenizer = AutoTokenizer.from_pretrained(model_id, \n",
        "                                               quantization_config=bnb_config, \n",
        "                                               add_bos_token=True, \n",
        "                                               trust_remote_code=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9e75ffa4-1d51-411b-b82a-3d7d54ccc98e",
      "metadata": {},
      "source": [
        "## 3. Run the Model! \n",
        "Replace `eval_prompt` with your prompt string below, and run the model!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "7988e989-432c-4f98-8e1b-383fbfa1da3a",
      "metadata": {},
      "outputs": [],
      "source": [
        "eval_prompt = \"The best way to \"\n",
        "model_input = eval_tokenizer(eval_prompt, return_tensors=\"pt\").to(\"cuda\")\n",
        "\n",
        "model.eval()\n",
        "with torch.no_grad():\n",
        "    print(eval_tokenizer.decode(model.generate(**model_input, max_new_tokens=100, repetition_penalty=1.15)[0], skip_special_tokens=True))"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "30885538-b32e-4911-8a87-8023ecdaaf26",
      "metadata": {
        "id": "VCJnpZoayRgq"
      },
      "source": [
        "### Sweet... it worked! \n",
        "\n",
        "Enjoy playing around with the full version of Gemma 2B!\n",
        "\n",
        "One thing I want to share: note the difference between the standard and inference models!\n",
        "\n",
        "Given the input `\"The best way to \"`, the outputs are:\n",
        "\n",
        "**Standard:**\n",
        ">The best way to <strong>get rid of a skunk</strong> is by using a <strong>skunk repellent</strong>.\n",
        ">\n",
        ">Skunks are known for their foul-smelling spray, which can be quite unpleasant if you come into contact with it.\n",
        ">\n",
        ">If you have a skunk problem in your yard or garden, there are several ways to get rid of them without harming them.\n",
        ">\n",
        ">In this article, we will discuss the different methods that you can use to get rid of skunks and how to prevent them from coming back.\n",
        "\n",
        "**Instruct:**\n",
        ">The best way to <b>find the perfect apartment in [city/neighborhood]</b> is to:\n",
        "> \n",
        "> **1. Define your needs and preferences:**\n",
        "> * **Location:** Consider commute time, proximity to public transportation, and access to amenities like shopping, dining, and entertainment.\n",
        "> * **Size and layout:** Think about how many bedrooms you need, the amount of living space, and whether you prefer a studio or one-bedroom apartment.\n",
        "> * **Budget:** Determine your maximum monthly rent and be prepared to negotiate based\n",
        "\n",
        "See how one is more generating one-way content, i.e. an article (standard) and the other generates more of a chatbot response (inference)?\n",
        "\n",
        "You can also try prompting with full questions and see how both do. You'll find that the Instruct model is far more like a chatbot, providing you with useful answers right out of the box, without much prompt engineering.\n",
        "\n",
        "\n",
        "Anyway, I hope you enjoyed this tutorial on running Gemma 2B. If you have any questions or suggestions, feel free to reach out to me on [X](https://x.com/harperscarroll) or on the [Discord](https://discord.gg/T9bUNqMS8d).\n",
        "\n",
        "🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
